Introduction

This paper describes a computer program which accepts and "understands"
a comfortable but restricted subset of one natural language, English. Certain
difficulties are inherent in this problem of making a machine "understand"
English*. Within the limited framework of the subject matter understood by
the program, many of these problems are solved or circumvented. I shall
describe these problems and my solutions, and point out those solutions which
I feel have general applicability. I will also indicate which must be
replaced by more general methods to be really useful, and give my ideas about
what general solutions to these particular problems might entail.

I shall not bore the reader at this point with a diatribe on why one
would want to communicate to the computer in English. Suffice it to say
that 200 million English speaking people can't be all wrong, and if they
could speak to a computer they might even be right more often. Man's
ability to use symbols and language is a prime factor in his intelligence,
and when we learn how to make a computer understand any natural language,
we will have taken a large step toward creating an "artificially intelligent"
machine. This is not to say that using "natural language" is necessary; one
might do even better to make people change to some more "intelligent" language.



*To avoid excessive circumlocutions, I shall henceforth use just "English"
instead of the hedging phrase "restricted subset of English", and use
"understand" only in the sense defined below.

The question naturally arises, "What do you mean by having a computer
understand natural language?" I have adopted the following operational
definition of understanding. A computer understands a subset of English
if it will accept input sentences which are members of this subset, and
correctly answer questions based on information contained in these sentences.
This ability must extend to deductions based on implicit information
contained in several sentences. It is desirable that the answers also be
in English to facilitate communication between the computer and a person.

We thus define "understanding" in terms of statements in English. The
computer must accept them as input, and answer certain queries about them. How
should the computer store the information contained in these statements?
If each sentence could be stored unchanged, no information would be lost;
but this would put a tremendous burden on the question-answering portion
of the program. The question answerer would have to find all relevant
sentences, extract the "meaning" pertinent to the question asked, and perform
those deductions and manipulations necessary to find the answer to the
question asked. For a large corpus, sorting out the relevant material
would be a very costly task.

One way of easing the burden of sorting is to create an index for the
input corpus. However, unless "meaning" is first extracted from the sentences,
the index must be based upon the words in a sentence. The value of the index
is then somewhat denigrated by the problems of synonymy and homography.

In a general question-answering system, each type of question may
require that meaning be extracted in a different way for convenient
manipulation and deduction. Deductive techniques may differ depending on the
type of question and the information available in the corpus. To simplify
such problems of sorting for relevant material, extracting meaning, and
making inferences, one is driven to select a "point of view", and to limit
the type of questions answerable by a system.

An example of a program which adopts such a point of view is the
SAD SAM program written by Robert Lindsay at Carnegie Tech in 1960. It
accepted as input most sentences which could be written in Basic English
(a subset of English, designed by C. K. Ogden, which contains a vocabulary
of about 1500 words). The questions which it answers are concerned with
family relationships between individuals, e.g., "Is Tom the brother of
Mary?" or "Who are Jack's grandchildren?" SAD SAM extracts the meaning
of a sentence relevant to family relationships and stores only that
information. Thus from the sentence, "Mary, Tom's sister, went to the meeting,"
the information about where Mary went would be discarded. The program stores
in a family tree type of representation the information about Mary and
Tom's relationship, i.e., it makes them both children of the same (as yet
unspecified) pair of parents. The family tree grows as more information
is added to the system. To answer a question concerning the relationship
course. An example is:

"The sum of two numbers is 96, and one of the numbers is 16
larger than the other number. Find the two numbers."

Exactly this statement of this problem has been accepted by the STUDENT
program and the following solution printed out:

"One of the numbers is 56"

"The other number is 40"

The details of how this is accomplished will be discussed below.
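
The arithmetic behind this example can be sketched as follows. This is only
an illustration in Python, not the STUDENT solver; the function name
`solve_sum_and_difference` is hypothetical. Writing x for the larger number
and y for the other, the problem states x + y = 96 and x = y + 16.

```python
def solve_sum_and_difference(total, difference):
    """Solve x + y = total and x = y + difference for (x, y)."""
    # Substituting x = y + difference into x + y = total gives
    # 2*y + difference = total, so y = (total - difference) / 2.
    y = (total - difference) / 2
    x = y + difference
    return x, y


larger, smaller = solve_sum_and_difference(96, 16)
print("One of the numbers is", int(larger))
print("The other number is", int(smaller))
```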

I chose this problem context for a number of reasons. First, there is
a good form in which to store this type of information for later
manipulation, namely as algebraic equations. Secondly, I felt that there was a
manageable subset of English in which many of these problems would be
expressible, and that this subset could be expanded incrementally.
Finally, there are a large number of "algebra story problems" available in
first year high school text books.

Since the entire process from input processing to question answering was
programmed, a measure of comparison with human performance is available.
The STUDENT program answers most questions that it can handle as fast
the type of information expected. The information storage structure used in
STUDENT falls into the category of what I call relational models.

A relational model is defined by three things: a set of objects, the
relationships between these objects expressible in the model, and a language
medium for exhibiting the relationships that exist between particular objects.
A relational model is useful for a question answering system if there exist
techniques which can take advantage of the relational language to find
implied, but not necessarily explicitly stated, relationships between objects
of the model.

Lindsay's program for answering questions about family relationships
uses such a relational model. The objects in the model are people, and
the basic relationship used in the model is the parent-child relationship.
The medium used for expressing the relationship between individuals is a
tree of nodes, with nodes representing individuals, and directed branches
representing the parent-child relationship. This model is useful because
all other family relationships can be defined in terms of this one basic
relationship, and questions about the relationship between two individuals
can be answered on the basis of a computation on the path connecting these
two individuals in the family tree.
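
A relational model of this kind can be sketched in a few lines of Python.
The class below is a hypothetical illustration, not Lindsay's program: the
class name, the method names, and the use of a single placeholder node for
an unnamed pair of parents are all assumptions made for brevity.

```python
class FamilyTree:
    """Hypothetical parent-child relational model (not Lindsay's SAD SAM)."""

    def __init__(self):
        self.parents = {}  # maps each child to the set of its known parents

    def add_parent(self, child, parent):
        self.parents.setdefault(child, set()).add(parent)

    def children(self, person):
        return {c for c, ps in self.parents.items() if person in ps}

    def are_siblings(self, a, b):
        # Siblings are distinct individuals sharing at least one parent node.
        shared = self.parents.get(a, set()) & self.parents.get(b, set())
        return a != b and bool(shared)

    def grandchildren(self, person):
        # Follow two parent-child branches downward from the person.
        found = set()
        for child in self.children(person):
            found |= self.children(child)
        return sorted(found)


tree = FamilyTree()
# "Mary, Tom's sister": both become children of one, as yet unnamed, node.
tree.add_parent("Mary", "unknown-1")
tree.add_parent("Tom", "unknown-1")
tree.add_parent("unknown-1", "Jack")
print(tree.are_siblings("Tom", "Mary"))
print(tree.grandchildren("Jack"))
```

Every derived relationship here (sibling, grandchild) is computed from the
single stored parent-child relation, which is the property the text relies on.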

The STUDENT question answering system also uses a relational model.
The objects in the model are words and phrases "naming" numbers, or numbers
with units attached. I call these objects "variables". The basic
relationships are the arithmetic relations of sum, difference, product, quotient,
exponentiation and equality. The medium for expressing the relationships
between objects is a set of equations. The model is useful because well
defined techniques exist for finding numerical values which satisfy sets
of simultaneous equations. Thus the system can answer questions about the
value of a number named by a given phrase, although this value is given only
implicitly in the relationships stated in the problem.

For any relational model used in an "English language" question answering
system, there are two important considerations. The first is how can
the English input be transformed into the relational language medium, and
the second is what are good deductive techniques for using the model to
solve problems. For algebra story problems a good general format for a
relational model was known, based on sets of simultaneous equations. The
implementation within the STUDENT program of the transformation and
solution procedures, based on this general model, is discussed below.

The Notation Used in STUDENT's Relational Model

The relational model in the STUDENT system uses a set of algebraic
equations to represent the arithmetic relationships expressed in the English
input. These equations are expressed in a parenthesized prefix notation
rather than the conventional infix notation. For example, the conventional
infix notation expression, B + C, is written (PLUS B C).

In general, in this prefix notation, the name of the arithmetic function
used is made the first element of a list, and succeeding list elements are
the variables which are the arguments of that function. The exact notation
used is given in Figure 1 below. Note that "minus" is a unary minus, and
that the usual binary subtraction operator is a composite relation in the model.
In addition, "plus" and "times" are not strictly binary. Indeed, in the
model they may have an indefinite number of arguments, e.g., (TIMES A B C)
is a legitimate prefix notation expression in the STUDENT model.

Operation         Infix Notation    Prefix Notation

Equality          A = B             (EQUAL A B)
Addition          A + B             (PLUS A B)
Negation          -A                (MINUS A)
Subtraction       A - B             (PLUS A (MINUS B))
Multiplication    A * B             (TIMES A B)
Division          A / B             (QUOTIENT A B)
Exponentiation    A^B               (EXPT A B)

Figure 1
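
To make the prefix notation concrete, here is a minimal evaluator sketched
in Python. It is an illustration under stated assumptions, not STUDENT's
LISP code: expressions are written as nested tuples, variable values come
from a dictionary `env`, and the function name `evaluate` is hypothetical.

```python
from functools import reduce
import operator


def evaluate(expr, env):
    """Evaluate a prefix expression given as nested tuples."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]  # a variable name
    op, *args = expr
    values = [evaluate(a, env) for a in args]
    if op == "PLUS":  # any number of arguments
        return sum(values)
    if op == "TIMES":  # any number of arguments
        return reduce(operator.mul, values, 1)
    if op == "MINUS":  # unary minus only
        (value,) = values
        return -value
    if op == "QUOTIENT":
        a, b = values
        return a / b
    if op == "EXPT":
        a, b = values
        return a ** b
    if op == "EQUAL":
        a, b = values
        return a == b
    raise ValueError(f"unknown operator: {op}")


env = {"A": 5, "B": 3, "C": 4}
print(evaluate(("PLUS", "A", ("MINUS", "B")), env))  # subtraction A - B
print(evaluate(("TIMES", "A", "B", "C"), env))
```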



The use of a fully parenthesized notation such as this circumvents the
problem of ambiguity in the order in which operations occur. In the
expression A + B * C in unparenthesized infix notation, it is unclear whether A is
to be added to B and the sum multiplied by C, or if the product of B and C
is to be added to A. One solution to this ambiguity is to give each
operation a relative precedence, and operations of higher precedence are assumed
to be performed first. Such a precedence scheme is assumed by STUDENT in
determining an interpretation of similar ambiguous English expressions.
Once inside the model, however, arithmetic operations and arguments are fully
parenthesized, and therefore, order of operations is unambiguous.
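
The effect of a precedence scheme can be sketched as follows. This Python
fragment uses the conventional arithmetic precedences (multiplication and
division bind tighter than addition and subtraction), which is an assumption
for illustration and not STUDENT's operator table; the names `PRECEDENCE`,
`PREFIX_NAME` and `to_prefix` are hypothetical.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
PREFIX_NAME = {"+": "PLUS", "*": "TIMES", "/": "QUOTIENT"}


def to_prefix(tokens, min_prec=1):
    """Turn a flat infix token list into a fully nested prefix tuple.

    `tokens` is consumed in place; operands are plain strings and no
    parentheses are allowed in the input.
    """
    left = tokens.pop(0)  # first operand
    while tokens and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Operators binding tighter than `op` are grouped to its right first.
        right = to_prefix(tokens, PRECEDENCE[op] + 1)
        if op == "-":
            # Binary subtraction is the composite PLUS of a unary MINUS.
            left = ("PLUS", left, ("MINUS", right))
        else:
            left = (PREFIX_NAME[op], left, right)
    return left


print(to_prefix("A + B * C".split()))
```

Once the nested form is built, no precedence information is needed to read
it, which is the point made in the paragraph above.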



Outline of the Operation of STUDENT

The first step in the operation of the STUDENT question answering system
is to provide the STUDENT "program" with some general information which it
will "remember" and use, if relevant, in all problems it is asked to do.
Through a program called REMEMBER, this information becomes part of the
permanent store of knowledge in the system and is what I call global information.
Some examples of global facts which have been given to the STUDENT system are:

"12 inches equals 1 foot"

or

"twice always means 2 times"

How these facts are used will be described later.

Then STUDENT is asked to solve a particular problem. To do so, STUDENT
transforms the English input statement of the problem into expressions in
its relational model, and then manipulates these expressions in the model
to find the answer. More specifically, STUDENT transforms the English input
into a set of simultaneous equations, keeping a list of what answers are
required, a list of the units involved (e.g., dollars, pounds) and a list of
all the variables in the equations. Then STUDENT attempts to solve this set
of equations for the desired unknowns. If a solution is found, STUDENT
prints the values of the unknowns requested in the format illustrated
earlier, i.e., it substitutes the appropriate phrases for the variable and
its value. If a solution cannot be found, various heuristics are used to
identify two variables (i.e., find two slightly different phrases that refer
to the same number). If two variables, A and B, are identified, the equation
A = B is added to the set of equations. Reference is made to the store of
global information to find any relevant equations. Assumptions made about
the identity of variables, and relevant stored equations retrieved by
STUDENT, are printed out. If use of these leads to a solution, the answer is
printed out in the format described above.

If a solution was not found, and certain idioms are present in the
English statement of the problem, then a substitution is made for each of
these idioms in turn, and the transformation and solution processes are
repeated. If it is unsuccessful with all single substitutions STUDENT
reports the failure and terminates. If the problem is ever solved, the
solution is printed and the program terminates.

Transformation of the English Input to the STUDENT Model

The words and phrases (strings of words) in the English input can be
classified into three distinct categories on the basis of how they are
handled in the transformation. The first category consists of strings of words
which denote objects in the model; I call such strings variables.
Variables are identified only by the string of words in them, and if two strings
differ at all, they define distinct variables. One important problem
considered below is how to determine when two distinct variables refer to the
same object.

The second class of words and phrases are what I call "substitutors".
Each substitutor may be replaced by another string. Some substitutions are
mandatory; others are optional and are only made if the problem cannot be
solved without such substitutions. An example of a mandatory substitution
is "2 times" for the word "twice". "Twice" always means "2 times" in the
context of the model, and therefore this substitution is always made. One
optional "idiomatic" substitution is "twice the sum of the length and width
of the rectangle" for "the perimeter of the rectangle". The use of these
substitutions in the transformation process is discussed below.

Members of the third class of words and phrases indicate the
relationships between the objects in the model, i.e., the variables in the problem.
I call members of this third class "operators". Operators may indicate
operations which are complex combinations of the basic relationships. One
simple operator is the word "plus" which indicates the operation of addition.
A complex operator is the phrase "percent less than", as in "10 percent
less than the marked price", which locates the number immediately preceding
"percent", subtracts it from 100, divides this result by 100, and then
multiplies this quotient by the variable following the "than".
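
The action of this complex operator can be written out as a small Python
sketch. The function name `percent_less_than` and the tuple representation
of prefix expressions and variables are assumptions for illustration only.

```python
def percent_less_than(k, variable):
    """Interpret "K percent less than VARIABLE" as a prefix product.

    The number K is subtracted from 100, the result is divided by 100,
    and the quotient multiplies the variable that follows "than".
    """
    factor = (100 - k) / 100
    return ("TIMES", factor, variable)


# "10 percent less than the marked price"
print(percent_less_than(10, ("MARKED", "PRICE")))
```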

The class of operators may be further subdivided according to where
the arguments of the operators are found. A prefix operator, such as
"the square of ..." precedes its argument. An operator like "... percent"
is a suffix operator, and follows its argument. Infix operators such as
"... plus ..." or "... less than ..." appear between their two arguments.
In a split prefix operator such as "difference between ... and ...", part of
the operator precedes, and part appears between the two arguments. "The sum
of ... and ... and ..." is a split prefix operator with an indefinite
number of arguments.

Some words may conditionally act as operators, depending on their
context. For example, "of" is only equivalent to "times" if there is a number
immediately preceding it; e.g., ".5 of the profit" is equivalent to ".5 times
the profit"; however, "Queen of England" does not imply a multiplicative
relationship between the Queen and her country.
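
The context test for "of" can be sketched in Python as below. This is a
simplified illustration working on a plain list of words; the names
`is_number` and `resolve_of` and the numeral pattern are assumptions, not
part of STUDENT.

```python
import re

NUMERAL = re.compile(r"\d+(\.\d*)?|\.\d+")  # e.g. "45", "69.70", ".5"


def is_number(word):
    return NUMERAL.fullmatch(word) is not None


def resolve_of(words):
    """Replace "of" by "times" only when a number immediately precedes it."""
    result = []
    for i, word in enumerate(words):
        if word.lower() == "of" and i > 0 and is_number(words[i - 1]):
            result.append("times")
        else:
            result.append(word)
    return result


print(resolve_of(".5 of the profit".split()))
print(resolve_of("Queen of England".split()))
```

Note that the word "number" itself does not match the numeral pattern, so
"the number of customers" is left unchanged.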

Let us now consider in detail the transformation procedure used by
STUDENT and see how these different types of phrases interact. To make the
process more concrete, let us consider the following example which has been
solved by STUDENT.



(THE PROBLEM TO BE SOLVED IS)

(IF THE NUMBER OF CUSTOMERS TOM GETS IS TWICE THE SQUARE OF 20 PER
CENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS , AND THE NUMBER OF
ADVERTISEMENTS HE RUNS IS 45 , WHAT IS THE NUMBER OF CUSTOMERS TOM
GETS Q.)

Figure 2

This text is a copy of actual printout from the program, showing stages
in the transformation and the solution of the problem. The parentheses are
an artifact of the LISP programming language, and "Q." is a replacement for
the question mark not available on the key punch.

The first stage in the transformation is to perform all mandatory
substitutions. In this problem, only the three phrases underlined (single
words are one word phrases) are substitutors: "twice" becomes "2 times",
"per cent" becomes the single word "percent", and "square of" is truncated
to "square". Having made these substitutions, STUDENT prints:



(WITH MANDATORY SUBSTITUTIONS THE PROBLEM IS)

(IF THE NUMBER OF CUSTOMERS TOM GETS IS 2 TIMES THE SQUARE 20 PERCENT
OF THE NUMBER OF ADVERTISEMENTS HE RUNS , AND THE NUMBER OF
ADVERTISEMENTS HE RUNS IS 45 , WHAT IS THE NUMBER OF CUSTOMERS TOM GETS Q.)

Figure 3 



Using dictionary entries for each word, the words in the problem are
now tagged by their function in terms of the transformation process, and
STUDENT prints:

(WITH WORDS TAGGED BY FUNCTION THE PROBLEM IS)

(IF THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) IS 2 (TIMES / OP 1)
THE (SQUARE / OP 1) 20 (PERCENT / OP 2) (OF / OP) THE NUMBER (OF / OP)
ADVERTISEMENTS (HE / PRO) RUNS , AND THE NUMBER (OF / OP) ADVERTISEMENTS
(HE / PRO) RUNS IS 45 , (WHAT / QWORD) IS THE NUMBER (OF / OP) CUSTOMERS
TOM (GETS / VERB) (QMARK / DLM))

Figure 4

If a word has a tag, or tags, the word followed by "/", followed by the tags,
becomes a single unit, and is enclosed in parentheses. Typical taggings
are indicated in Figure 4. "(OF / OP)" indicates that "of" is an operator
and other taggings show that "gets" is a verb, "times" is an operator of
level 1 (operator levels will be explained below), "square" is an
operator of level 1, "percent" is an operator of level 2, "he" is a pronoun,
"what" is a question word, and "QMARK" (replacing Q.) is a delimiter of a
sentence. These tagged words will play the principal role in the
remaining transformation to the set of equations implicit in this problem
statement.

The next stage in the transformation is to break the input sentences
into "simple sentences". As in the example, a problem may be stated using
sentences of great grammatical complexity; but the final stage of the
transformation is only defined on a set of simple sentences. The method adopted
here to perform this analysis is ad hoc and primitive, but works reasonably
well because of the limited number of ways in which algebra story problems
are expressed. This problem of extraction of simple understandable sentences
occurs in any general language processor.

The simplification method employed in STUDENT depends on the recursive
use of format matching. If an input sentence is of the form "if" followed
by a substring, followed by a comma, a question word and a second substring
(i.e., matches the COMIT left half IF + $ + , + $1/QWORD + $) then the
first substring (between the IF and the comma) is made an independent
sentence, and everything following the comma is made a second sentence. In
the example, this means that the input is resolved into the two sentences
(where tags are omitted for the sake of brevity):

"The number of customers Tom gets is 2 times the square 20 percent of
the number of advertisements he runs, and the number of advertisements
he runs is 45." and "What is the number of customers Tom gets?"

This last procedure effectively resolves a problem into declarative
assumptions and a question sentence. A second complexity resolved by
STUDENT is illustrated in the first sentence of this pair. A coordinate
sentence consisting of two sentences joined by a comma immediately
followed by an "and" (i.e., any sentence matching the COMIT left half
$ + , + AND + $) will be resolved into the two independent sentences. The
first sentence above is therefore resolved into two simpler sentences.
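
The two format simplifications can be approximated with regular expressions.
The sketch below is in Python rather than COMIT, operates on untagged text,
and assumes a tiny hard-coded set of question words; `IF_FORMAT`,
`AND_FORMAT` and `simplify` are hypothetical names.

```python
import re

# "if" + substring + "," + question word + substring
IF_FORMAT = re.compile(r"if\s+(.*),\s*((?:what|how|who)\b.*)$", re.IGNORECASE)
# substring + "," + "and" + substring
AND_FORMAT = re.compile(r"(.*?),\s*and\s+(.*)$", re.IGNORECASE)


def simplify(sentence):
    """Recursively split a sentence into a list of simple sentences."""
    sentence = sentence.strip()
    for pattern in (IF_FORMAT, AND_FORMAT):
        m = pattern.match(sentence)
        if m:
            # The first part becomes an independent declarative sentence.
            return simplify(m.group(1) + ".") + simplify(m.group(2))
    return [sentence]


problem = (
    "If the number of customers Tom gets is 2 times the square 20 percent "
    "of the number of advertisements he runs, and the number of "
    "advertisements he runs is 45, what is the number of customers Tom gets?"
)
for simple in simplify(problem):
    print(simple)
```

Applied to the example, the sketch yields two declarative sentences followed
by the question sentence.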

Using these two ad hoc format simplifications, the problem statement
is put into canonically "simple" sentences. For the example, STUDENT prints:

(THE SIMPLE SENTENCES ARE) 

(THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) IS 2 (TIMES / OP 1)
THE (SQUARE / OP 1) 20 (PERCENT / OP 2) (OF / OP) THE NUMBER (OF / OP)
ADVERTISEMENTS (HE / PRO) RUNS (PERIOD / DLM))

(THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS IS 45
(PERIOD / DLM))

((WHAT / QWORD) IS THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB)
(QMARK / DLM))

Figure 5 

Each simple sentence is a separate list, i.e., is enclosed in parentheses,
and each ends with a delimiter (a period or question mark). Each of these
sentences can now be transformed directly to its interpretation in the model.
nized by the STUDENT program. New operators can easily be added to the
program equivalent of this table.

In performing the transformation of a phrase P, a left to right search
is made for an operator of level 2 (indicated by subscripts of "OP" and 2).
If none is found, a left to right search is made for a level 1 operator
(indicated by subscripts "OP" and 1), and finally another left to right
search for an operator of level 0 (indicated by a subscript "OP" and no
numerical subscript). If an operator is found, this operator and its
context are transformed as indicated in column 4 in the table. If no
operator is present, delimiters and articles (a, an and the) are deleted and
the phrase is treated as an indivisible entity, a variable.
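
The search order just described can be sketched in Python. The
representation is an assumption made for illustration: a phrase is a list
whose items are plain words (strings) or `(word, level)` pairs for words
tagged as operators; `find_operator` and `to_variable` are hypothetical
names.

```python
ARTICLES = {"A", "AN", "THE"}


def find_operator(phrase):
    """Return the index of the operator to transform first, or None.

    Levels are searched in the order 2, 1, 0, each from left to right.
    """
    for level in (2, 1, 0):
        for i, item in enumerate(phrase):
            if isinstance(item, tuple) and item[1] == level:
                return i
    return None


def to_variable(phrase):
    """With no operator present, drop articles and keep the rest as one unit."""
    return tuple(w for w in phrase if w.upper() not in ARTICLES)


p2 = ["2", ("TIMES", 1), "THE", ("SQUARE", 1), "20", ("PERCENT", 2),
      ("OF", 0), "THE", "NUMBER", ("OF", 0), "ADVERTISEMENTS", "HE", "RUNS"]
print(p2[find_operator(p2)])
print(to_variable(["THE", "MARKED", "PRICE"]))
```

For the phrase shown, the level 2 operator PERCENT is selected even though
two level 1 operators stand to its left.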

In the example, the first simple sentence is

(THE NUMBER (OF/OP) CUSTOMERS TOM (GETS/VERB) IS 2 (TIMES/OP 1) THE
(SQUARE/OP 1) 20 (PERCENT/OP 2) (OF/OP) THE NUMBER (OF/OP)
ADVERTISEMENTS (HE/PRO) RUNS (PERIOD/DLM))

This is of the form "P1 is P2", and is transformed to (EQUAL P1* P2*).
P1 is "(THE NUMBER (OF/OP) CUSTOMERS TOM (GETS/VERB))". The occurrence
of the verb "gets" is ignored because of the presence of the "is" in the
sentence, meaning "equals". The only operator found is "(OF/OP)". From
the table we see that if "of" is immediately preceded by a number (not the
word "number") it is treated as if it were the infix "times". In this case,
however, "of" is not preceded by a number; the subscript OP indicating that
"of" is an operator is stripped away, and the transformation process is
repeated on the phrase with "of" no longer acting as an operator. In this
repetition, no operators are found, and P1* is the variable

(NUMBER OF CUSTOMERS TOM (GETS/VERB))

To the right of "is" in the sentence is P2:

(2 (TIMES/OP 1) THE (SQUARE/OP 1) 20 (PERCENT/OP 2) (OF/OP) THE NUMBER
(OF/OP) ADVERTISEMENTS (HE/PRO) RUNS (PERIOD/DLM))



Operators in STUDENT

Operator    Precedence  Context                          Transformation to
            Level                                        Interpretation in
                                                         the Model (a)

PLUS        2           P1 PLUS P2                       (PLUS P1* P2*)
PLUSS       0           P1 PLUSS P2                      (PLUS P1* P2*)
MINUS       2           P1 MINUS P2  (b)                 (PLUS P1* (MINUS P2*))
                        MINUS P2                         (MINUS P2*)
MINUSS      0           P1 MINUSS P2                     (PLUS P1* (MINUS P2*))
TIMES       1           P1 TIMES P2                      (TIMES P1* P2*)
DIVBY       1           P1 DIVBY P2                      (QUOTIENT P1* P2*)
SQUARE      1           SQUARE P1  (c)                   (EXPT P1* 2)
SQUARED     0           P1 SQUARED                       (EXPT P1* 2)
**          0           P1 ** P2                         (EXPT P1* P2*)
LESSTHAN    2           P1 LESSTHAN P2                   (PLUS P2* (MINUS P1*))
PER         0           P1 PER K P2  (e)                 (QUOTIENT P1* (K P2)*)  (d)
                        P1 PER P2                        (QUOTIENT P1* (1 P2)*)
PERCENT     2           P1 K PERCENT P2                  (P1 (K/100) P2)*  (f)
PERLESS     2           P1 K PERLESS P2                  (P1 ((100-K)/100) P2)*
SUM         0           SUM P1 AND P2 AND P3             (PLUS P1* (SUM P2 AND P3)*)  (c)
                        SUM P1 AND P2                    (PLUS P1* P2*)
DIFFERENCE  0           DIFFERENCE BETWEEN P1 AND P2     (PLUS P1* (MINUS P2*))
OF          0           K OF P2                          (TIMES K P2*)
                        P1 OF P2                         (P1 OF P2)*

(a) If P1 is a phrase, P1* indicates its interpretation in the model.

(b) When two possible contexts are indicated, they are checked in the
order shown.

(c) SQUARE P1 and SUM P1 are idiomatic shortenings of SQUARE OF P1 and
SUM OF P1.

(d) * outside a parenthesized expression indicates that the entire phrase
enclosed is to be transformed.

(e) K is a number.

(f) / and - imply that the indicated arithmetic operations are actually
performed.

Figure 6

The first operator found in P2 is PERCENT, an operator at level 2.
From the table in Figure 6, we see that this operator has the effect of
dividing the number immediately preceding it by 100. The "PERCENT" is
removed and the transformation is repeated on the remaining phrase. In
the example, the "... 20 (PERCENT/OP 2) (OF/OP) ..." becomes

"... .2000 (OF/OP) ..."

Continuing the transformation, the operators found are, in order,
TIMES, SQUARE, OF and OF. Each is handled as indicated in the table. The
"of" in the context "... .2000 (OF/OP) THE ..." is treated as an infix TIMES,
while at the other occurrence of "of", the operator marking is removed.
The resulting transformed expression for P2 is:

(TIMES 2 (EXPT (TIMES .2 (NUMBER OF ADVERTISEMENTS (HE/PRO) RUNS)) 2))

The transformation of the second sentence of the example is done in
a similar manner, and yields the equation:

(EQUAL (NUMBER OF ADVERTISEMENTS (HE/PRO) RUNS) 45)

The third sentence is of the form "What is P1?". It starts with a
question word and is therefore treated specially. A unique variable, a
single word consisting of an X followed by five integers, is created, and
the equation (EQUAL Xnnnnn P1*) is stored. For this example, the variable
X00001 was created, and this last simple sentence is transformed to the
equation:

(EQUAL X00001 (NUMBER OF CUSTOMERS TOM (GETS/VERB)))

In addition, the created variable is placed on the list of variables for
which STUDENT is to find a value. Also, this variable is stored, paired
with P1, the untransformed right side, for use in printing out the answer.
If a value is found for this variable, STUDENT prints the sentence (P1 is
value) with the appropriate substitution for value. Figure 7 shows the
full set of equations, and the printed solution given by STUDENT for the
example being considered. For ease in solution, the last equations created
are the first in this list of equations.

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (NUMBER OF CUSTOMERS TOM (GETS / VERB)))

(EQUAL (NUMBER OF ADVERTISEMENTS (HE / PRO) RUNS) 45)

(EQUAL (NUMBER OF CUSTOMERS TOM (GETS / VERB)) (TIMES 2 (EXPT
(TIMES .2000 (NUMBER OF ADVERTISEMENTS (HE / PRO) RUNS)) 2)))

(THE NUMBER OF CUSTOMERS TOM GETS IS 162)

Figure 7 
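
The printed answer can be checked by substituting the second equation into
the third. The Python fragment below is only an arithmetic check under the
equations of Figure 7, not a reproduction of STUDENT's solver; the function
name `customers` is hypothetical.

```python
def customers(advertisements):
    """(TIMES 2 (EXPT (TIMES .2000 advertisements) 2)) from Figure 7."""
    return 2 * (0.2 * advertisements) ** 2


# Second equation: the number of advertisements he runs is 45.
print(round(customers(45)))
```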

In the example just shown, the equality relation was indicated by the
copula "is". In the problem solved by STUDENT shown in Figure 8 below,
equality is indicated by the occurrence of a transitive verb in the proper
context.

(THE PROBLEM TO BE SOLVED IS) 

(TOM HAS TWICE AS MANY FISH AS MARY HAS GUPPIES . IF MARY HAS
3 GUPPIES , WHAT IS THE NUMBER OF FISH TOM HAS Q.)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (NUMBER OF FISH TOM (HAS / VERB)))

(EQUAL (NUMBER OF GUPPIES (MARY / PERSON) (HAS / VERB)) 3)

(EQUAL (NUMBER OF FISH TOM (HAS / VERB)) (TIMES 2 (NUMBER OF
GUPPIES (MARY / PERSON) (HAS / VERB))))

(THE NUMBER OF FISH TOM HAS IS 6)

Figure 8

The verb in this case is "has". The simple sentence "Mary has 3 guppies"
is transformed to the "equivalent" sentence "The number of guppies Mary
has is 3" and the processing of this latter sentence is done as previously
discussed.

The general format for this type of sentence, and the format of the
intermediate sentence to which it is transformed is best expressed by the
following COMIT transformation rule:

$ + $1/VERB + $1/NUMBER + $ = THE + NUMBER + OF + 4 + 1 + 2 + IS + 3

This may be read as: anything (a subject) followed by a verb followed by
a number followed by anything (the unit) is transformed to a sentence
starting with "THE NUMBER OF" followed by the unit, followed by the subject
and the verb, followed by "IS" and then the number. In "Mary has 3 guppies"
the subject is "Mary", the verb "has", and the units "guppies". Similarly,
the sentence "The witches of Firth brew 3 magic potions" would be
transformed to

"The number of magic potions the witches of Firth brew is 3."

In addition to a declaration of number, single object transitive
verbs may be used in a comparative structure, such as exhibited in the
sentence "Tom has twice as many fish as Mary has guppies." The COMIT
rule which gives the effective transformation for this type of sentence
structure is:

$ + $1/VERB + $ + AS + MANY + $ + AS + $ + $1/VERB + $ =
THE + NUMBER + OF + 6 + 1 + 2 + IS + 3 + THE + NUMBER + OF
+ 10 + 8 + 9

For the example, the transformed sentence is:

"The number of fish Tom has is twice the number of guppies Mary has."

Transformation of new sentence formats to formats previously
"understood" by the program can be easily added to the program, thus extending
the subset of English "understood" by STUDENT. In the processing that
actually takes place within STUDENT the intermediate sentence never exists.
It is easier to go directly to the model from the format, utilizing
subroutines previously defined in terms of the semantics of the model.
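
The first of these rules (subject, verb, number, unit) can be imitated with
a regular expression. The Python sketch below is an illustration of the
rearrangement only: it produces the intermediate sentence that, as noted
above, STUDENT itself never builds, and it assumes a tiny hard-coded verb
list. `VERBS`, `RULE` and `declare_number` are hypothetical names.

```python
import re

VERBS = ("has", "have", "brew", "gets")  # assumed stand-in for the dictionary

RULE = re.compile(
    r"^(?P<subject>.+?)\s+(?P<verb>" + "|".join(VERBS) + r")\s+"
    r"(?P<number>\d+(?:\.\d+)?)\s+(?P<unit>.+)$",
    re.IGNORECASE,
)


def declare_number(sentence):
    """Rewrite 'SUBJECT VERB NUMBER UNIT' as 'The number of UNIT ... is NUMBER.'"""
    m = RULE.match(sentence.rstrip(". "))
    if m is None:
        return None
    parts = m.groupdict()
    subject = parts["subject"]
    # A leading article moves into mid-sentence position, so lower-case it.
    if subject.split()[0].lower() in ("the", "a", "an"):
        subject = subject[0].lower() + subject[1:]
    return "The number of {} {} {} is {}.".format(
        parts["unit"], subject, parts["verb"], parts["number"]
    )


print(declare_number("Mary has 3 guppies"))
print(declare_number("The witches of Firth brew 3 magic potions"))
```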

The word "is" indicates equality only if it is not used as an
auxiliary. The example in Figure 9 shows how verbal phrases containing "is", such
as "is multiplied by", and "is increased by" are handled in the
transformation.

(THE PROBLEM TO BE SOLVED IS)

(A NUMBER IS MULTIPLIED BY 6 . THIS PRODUCT IS INCREASED BY 44 .
THIS RESULT IS 68 . FIND THE NUMBER .)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (NUMBER))

(EQUAL (PLUS (TIMES (NUMBER) 6) 44) 68)

(THE NUMBER IS 4)

Figure 9

The sentence "A number is multiplied by 6" only indicates that two
objects in the model are related multiplicatively, and does not indicate
explicitly any equality relation. The interpretation of this sentence in
the model is the prefix notation product:

(TIMES (NUMBER) 6)

This latter phrase is stored in a temporary location for possible later
reference. In this problem, it is referenced in the next sentence, with
the phrase "THIS PRODUCT". The important word in this last phrase is
"THIS". STUDENT ignores all words in a variable containing the key word
"THIS". The last temporarily stored phrase is substituted for the "this"
variable. Thus, the first three sentences in the problem stated in Figure
9 yield only one equation, after two substitutions for "this" phrases. The
last sentence "Find the number." is transformed as if it were "What is the
number Q.", and yields the first equation shown.

The word "this" may occur in a context where it is not referring to 
a previously stored phrase. In Figure 10 is an example of such a context. 

(THE PROBLEM TO BE SOLVED IS)
(THE PRICE OF A RADIO IS 69.70 DOLLARS . IF THIS PRICE IS
15 PERCENT LESS THAN THE MARKED PRICE , FIND THE MARKED PRICE
.)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00001 (MARKED PRICE))
(EQUAL (PRICE OF RADIO) (TIMES .8499 (MARKED PRICE)))
(EQUAL (PRICE OF RADIO) (TIMES 69.70 (DOLLARS)))
(THE MARKED PRICE IS 82 DOLLARS)

Figure 10 

In such contexts, the phrase containing "this" is replaced by the left half
of the last equation created. In the example, the phrase "this price" is
replaced by "the price of a radio".

The problem in Figure 10 illustrates two other features of the STUDENT
program. The first is the action of the complex operator "percent less than".
It causes the number immediately preceding it, i.e., 15, to be subtracted
from 100, and this result divided by 100, to give .85 (.8499 due to rounding
errors in conversion). Then this operator becomes the infix operator "TIMES".
This is as indicated in the table in Figure 6.
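The arithmetic performed by "percent less than" can be checked with a short Python sketch. The function name is hypothetical; only the rule (subtract from 100, divide by 100, then multiply) comes from the text.

```python
def percent_less_than(percent):
    """Factor that multiplies the quantity following "percent less than"."""
    return (100 - percent) / 100


factor = percent_less_than(15)   # 0.85 (STUDENT printed .8499)
# (PRICE OF RADIO) = factor * (MARKED PRICE), and the price is 69.70 dollars.
marked_price = 69.70 / factor
print(round(marked_price))       # 82
```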

This problem also illustrates how units such as "dollars" are handled
by the STUDENT program. Any word which immediately follows a number is
labelled as a special type of variable called a unit. A number followed
by a unit is treated in the equation as a product of the number and the
unit, e.g., "(TIMES 69.70 (DOLLARS))". Units are treated specially in solving
the set of equations, in that any unit may appear in the answer. If the
value for a variable found by the solver is the product of a number and
a unit, STUDENT concatenates the number and unit. For example, the
solution for "(MARKED PRICE)" in the problem in Figure 10 was
(TIMES 82 (DOLLARS)) and STUDENT prints out

(THE MARKED PRICE IS 82 DOLLARS)

There is an exception to the fact that any unit may appear in the
answer, as illustrated in Figure 11.

(THE PROBLEM TO BE SOLVED IS)
(IF 1 SPAN EQUALS 9 INCHES , AND 1 FATHOM EQUALS 6 FEET , HOW
MANY SPANS EQUALS 1 FATHOM Q.)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00001 (TIMES 1 (FATHOMS)))
(EQUAL (TIMES 1 (FATHOMS)) (TIMES 6 (FEET)))
(EQUAL (TIMES 1 (SPANS)) (TIMES 9 (INCHES)))

UNABLE TO SOLVE THIS SET OF EQUATIONS

(USING THE FOLLOWING KNOWN RELATIONSHIPS)
((EQUAL (TIMES 1 (YARDS)) (TIMES 3 (FEET))) (EQUAL (TIMES 1
(FEET)) (TIMES 12 (INCHES))))

(1 FATHOM IS 8 SPANS)

Figure 11 

If, as in the problem in Figure 11, the unit of the answer is specified
(by the phrase "how many spans"), then only that unit, in this case spans,
may appear in the answer. Without this restriction, STUDENT would blithely
answer this problem with "(1 FATHOM IS 1 FATHOM)".

In the transformation from the English statement of the problem to
the equations, 9 inches became (TIMES 9 (INCHES)). However, 1 fathom
became (TIMES 1 (FATHOMS)). The plural form for fathom has been
substituted for the singular form. STUDENT always uses the plural form if known,
to ensure that all units appear in only one form. Since "fathom" and
"fathoms" are different, STUDENT would treat them as distinct, unrelated
units. The plural form is part of the global information that can be made
available to STUDENT, and the plural form of a word is substituted for any
singular form appearing after "1" in any phrase. The inverse operation is
carried out to perform correct printout of the solution.
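A minimal Python sketch of this plural normalization and its inverse follows. The table contents and function names are assumptions made for illustration; in STUDENT the plural forms come from the global information store.

```python
# Hypothetical plural table; STUDENT obtains these forms from global information.
PLURAL = {"FATHOM": "FATHOMS", "SPAN": "SPANS", "FOOT": "FEET", "INCH": "INCHES"}
SINGULAR = {plural: singular for singular, plural in PLURAL.items()}


def normalize_units(words):
    """Replace a singular form that follows "1" by its plural, if one is known."""
    out = list(words)
    for i in range(1, len(out)):
        if out[i - 1] == "1":
            out[i] = PLURAL.get(out[i], out[i])
    return out


def print_quantity(number, unit):
    """Inverse operation for printout: use the singular form after 1."""
    if number == 1:
        unit = SINGULAR.get(unit, unit)
    return f"{number} {unit}"


print(normalize_units("IF 1 SPAN EQUALS 9 INCHES".split()))
print(print_quantity(1, "FATHOMS"), "IS", print_quantity(8, "SPANS"))
```

Keeping a single internal form means the solver never has to relate "fathom" to "fathoms"; the singular reappears only when the answer is printed.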

Notice that the information given in the problem was insufficient to
allow solution of the set of equations to be solved. Therefore, STUDENT
looked in its glossary for information concerning each of the units in
this set of equations. It found the relationship "1 foot equals 12 inches".
Using this fact, and the equation it implies, STUDENT is able to solve the
problem. Thus, in certain cases where a problem is not "analytic", in the
sense that it does not contain, explicitly stated, all the information
needed for its solution, STUDENT is able to draw on a body of facts, pick
out relevant ones, and use these to obtain a solution.
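The effect of adding the glossary fact can be reproduced with a small Python sketch. It does not reproduce STUDENT's equation solver; it is a simplified stand-in that chains unit relations, and the names `size_in` and `convert` are hypothetical.

```python
from fractions import Fraction

# Each entry (a, u, b, v) reads "a of unit u equals b of unit v".
problem_facts = [(1, "FATHOMS", 6, "FEET"), (1, "SPANS", 9, "INCHES")]
glossary_facts = [(1, "YARDS", 3, "FEET"), (1, "FEET", 12, "INCHES")]


def size_in(base, facts):
    """Express every unit reachable through the facts as a multiple of base."""
    size = {base: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for a, u, b, v in facts:
            if v in size and u not in size:
                size[u] = Fraction(b, a) * size[v]
                changed = True
            elif u in size and v not in size:
                size[v] = Fraction(a, b) * size[u]
                changed = True
    return size


def convert(amount, unit, target, facts):
    """Return amount of unit expressed in target, or None if no chain exists."""
    size = size_in(target, facts)
    if unit not in size:
        return None  # plays the role of UNABLE TO SOLVE THIS SET OF EQUATIONS
    return amount * size[unit]


print(convert(1, "FATHOMS", "SPANS", problem_facts))                   # None
print(convert(1, "FATHOMS", "SPANS", problem_facts + glossary_facts))  # 8
```

With the problem's own facts there is no chain from fathoms to spans; the stored fact "1 foot equals 12 inches" supplies the missing link.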

There is another class of problems which I call semi-analytic. In
such problems, the transformation process does not yield a set of solvable
equations. However, in this set of equations there exists a pair of
variables (or more than one pair) such that the two variables are only
"slightly" different, and really name the same object in the model. When a
set of equations is unsolvable STUDENT searches for relevant global equations.
In addition, it uses several heuristic techniques for identifying two slightly
different variables in the equations. The problem in Figure 12 illustrates
identification of two variables where in one variable a pronoun has been
substituted for a noun phrase in the other variable. This identification
is made by checking all variables appearing before one containing the
pronoun, and finding one which is identical to this pronoun phrase, with a

(THE PROBLEM TO BE SOLVED IS)
(THE NUMBER OF SOLDIERS THE RUSSIANS HAVE IS ONE HALF OF THE
NUMBER OF GUNS THEY HAVE . THE NUMBER OF GUNS THEY HAVE IS
7000 . WHAT IS THE NUMBER OF SOLDIERS THEY HAVE Q.)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00001 (NUMBER OF SOLDIERS (THEY / PRO) (HAVE / VERB)))
(EQUAL (NUMBER OF GUNS (THEY / PRO) (HAVE / VERB)) 7000)
(EQUAL (NUMBER OF SOLDIERS RUSSIANS (HAVE / VERB)) (TIMES .5000
(NUMBER OF GUNS (THEY / PRO) (HAVE / VERB))))

UNABLE TO SOLVE THIS SET OF EQUATIONS

(ASSUMING THAT)
((NUMBER OF SOLDIERS (THEY / PRO) (HAVE / VERB)) IS EQUAL TO
(NUMBER OF SOLDIERS RUSSIANS (HAVE / VERB)))

(THE NUMBER OF SOLDIERS THEY HAVE IS 3500)

Figure 12 

substitution of a string of any length for the pronoun. If two variables
match in this fashion, STUDENT assumes the two variables are equal, and
prints out a statement of this assumption, as shown. The solution
procedure is then tried again, with the additional equations from
identifications. In the example, the additional equation was sufficient to
determine the solution.
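The pronoun match can be sketched in Python as follows. This is an illustration only: the function name, the pronoun list, and the requirement that the pronoun stand for at least one word are assumptions, not details given in the text.

```python
def matches_with_pronoun(pronoun_var, candidate,
                         pronouns=("THEY", "HE", "SHE", "IT")):
    """True if candidate equals pronoun_var with the pronoun replaced by words."""
    for i, word in enumerate(pronoun_var):
        if word in pronouns:
            prefix, suffix = pronoun_var[:i], pronoun_var[i + 1:]
            middle = len(candidate) - len(prefix) - len(suffix)
            return (
                middle >= 1  # assumption: the pronoun stands for at least one word
                and candidate[:len(prefix)] == prefix
                and candidate[len(candidate) - len(suffix):] == suffix
            )
    return False  # no pronoun in the variable


soldiers_they = "NUMBER OF SOLDIERS THEY HAVE".split()
soldiers_russians = "NUMBER OF SOLDIERS RUSSIANS HAVE".split()
guns_they = "NUMBER OF GUNS THEY HAVE".split()

print(matches_with_pronoun(soldiers_they, soldiers_russians))  # True
print(matches_with_pronoun(guns_they, soldiers_russians))      # False

# With the two soldier variables assumed equal, the equations of Figure 12 give
guns = 7000
soldiers = 0.5 * guns
print(int(soldiers))  # 3500
```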

The example in Figure 13 is again a "non-analytic" problem. The
first set of equations developed by STUDENT is unsolvable. Therefore,
STUDENT tries to find some relevant equations in its store of global
information. It uses the first word of each variable string as a key to
its glossary. The one exception to this rule is that the words "number of"
are ignored if they are the first two words of a variable string. Thus,
in this problem, STUDENT retrieved equations which were stored under the
key words distance, gallons, gas, and miles. Two facts about distance
had been stored earlier: "distance equals speed times time" and
"distance equals gas consumption times number of gallons of gas used". The
equations implicit in these sentences were stored and retrieved now as
possibly relevant to the solution. In fact, only the second is relevant.
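The keyed storage and retrieval just described might look like the following Python sketch. A dictionary stands in for the property lists of LISP atoms, and the names `glossary_key`, `remember`, and `retrieve` are hypothetical.

```python
from collections import defaultdict


def glossary_key(variable_words):
    """First word of the variable, skipping a leading NUMBER OF."""
    if variable_words[:2] == ["NUMBER", "OF"] and len(variable_words) > 2:
        variable_words = variable_words[2:]
    return variable_words[0]


glossary = defaultdict(list)  # stand-in for property lists keyed by atom


def remember(equation_text, variables):
    """Store the equation under the key word of each of its variables."""
    for var in variables:
        glossary[glossary_key(var)].append(equation_text)


def retrieve(problem_variables):
    """Collect, without duplicates, every equation stored under a key word."""
    found = []
    for var in problem_variables:
        for eq in glossary.get(glossary_key(var), []):
            if eq not in found:
                found.append(eq)
    return found


speed_eq = "(EQUAL (DISTANCE) (TIMES (SPEED) (TIME)))"
gas_eq = ("(EQUAL (DISTANCE) (TIMES (GAS CONSUMPTION) "
          "(NUMBER OF GALLONS OF GAS USED)))")
remember(speed_eq, [["DISTANCE"], ["SPEED"], ["TIME"]])
remember(gas_eq, [["DISTANCE"], ["GAS", "CONSUMPTION"],
                  "NUMBER OF GALLONS OF GAS USED".split()])

problem_variables = [
    "NUMBER OF GALLONS OF GAS USED ON TRIP BETWEEN NEW YORK AND BOSTON".split(),
    "DISTANCE BETWEEN BOSTON AND NEW YORK".split(),
    "GAS CONSUMPTION OF MY CAR".split(),
]
print(retrieve(problem_variables))
```

Both stored equations are retrieved because both are filed under DISTANCE, even though only the gas consumption equation turns out to be useful.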



(THE PROBLEM TO BE SOLVED IS)
(THE GAS CONSUMPTION OF MY CAR IS 15 MILES PER GALLON . THE
DISTANCE BETWEEN BOSTON AND NEW YORK IS 250 MILES . WHAT IS
THE NUMBER OF GALLONS OF GAS USED ON A TRIP BETWEEN NEW YORK
AND BOSTON Q.)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00001 (NUMBER OF GALLONS OF GAS USED ON TRIP BETWEEN
NEW YORK AND BOSTON))
(EQUAL (DISTANCE BETWEEN BOSTON AND NEW YORK) (TIMES 250 (MILES)))
(EQUAL (GAS CONSUMPTION OF MY CAR) (QUOTIENT (TIMES 15 (MILES))
(TIMES 1 (GALLONS))))

UNABLE TO SOLVE THIS SET OF EQUATIONS

(USING THE FOLLOWING KNOWN RELATIONSHIPS)
((EQUAL (DISTANCE) (TIMES (SPEED) (TIME))) (EQUAL (DISTANCE)
(TIMES (GAS CONSUMPTION) (NUMBER OF GALLONS OF GAS USED))))

(ASSUMING THAT)
((DISTANCE) IS EQUAL TO (DISTANCE BETWEEN BOSTON AND NEW YORK))

(ASSUMING THAT)
((GAS CONSUMPTION) IS EQUAL TO (GAS CONSUMPTION OF MY CAR))

(ASSUMING THAT)
((NUMBER OF GALLONS OF GAS USED) IS EQUAL TO (NUMBER OF GALLONS
OF GAS USED ON TRIP BETWEEN NEW YORK AND BOSTON))

(THE NUMBER OF GALLONS OF GAS USED ON A TRIP BETWEEN NEW YORK
AND BOSTON IS 16.66 GALLONS)

Figure 13 



A. I. Memo 66 -26- Memorandum MAC-M- 148 

Before any attempt is made to solve this augmented set of equations,
the variables of the augmented set are matched, to identify "slightly
different" variables which refer to the same object in the model. In
this example "(DISTANCE)", "(GAS CONSUMPTION)" and "(NUMBER OF GALLONS OF
GAS USED)" are all identified with "similar" variables. The following
heuristically determined conditions must be satisfied for identification
of variables P1 and P2.

1) P1 must appear later in the problem than P2.

2) P1 is completely contained in P2 in the sense that P1 is a
contiguous substring within P2.

This identification reflects a syntactic phenomenon where a truncated
phrase, with one or more modifying phrases dropped, is often used in place
of the entire phrase. For example, if the phrase "the length of a
rectangle" has occurred, the phrase "the length" may be used to mean the same
thing. This identification is distinct from that made using pronoun
substitution.
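The two conditions for identifying P1 with P2 can be written as a short Python sketch. Variables are represented here as lists of words in order of appearance; this representation and the function names are assumptions for illustration.

```python
def is_contiguous_sublist(short, long):
    """True if the words of short occur consecutively, in order, within long."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def identify(variables):
    """Return pairs (P1, P2) where P1 appears later than P2 and lies within it."""
    pairs = []
    for later in range(len(variables)):
        for earlier in range(later):
            p1, p2 = variables[later], variables[earlier]
            if p1 != p2 and is_contiguous_sublist(p1, p2):
                pairs.append((p1, p2))
    return pairs


variables = [
    ["LENGTH", "OF", "RECTANGLE"],
    ["WIDTH", "OF", "RECTANGLE"],
    ["LENGTH"],
]
print(identify(variables))
# [(['LENGTH'], ['LENGTH', 'OF', 'RECTANGLE'])]
```

Because of condition 1, the shortened phrase is identified only when the full phrase has already appeared; the same variables listed in the opposite order yield no pair.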

In the example in Figure 13, a schema is used by identifying the
variables in the schema with the variables that occur in the problem. This
problem is solvable exactly because the key phrases "distance", "gas
consumption" and "number of gallons of gas used" occur as substrings of
the variables in the problem. Since STUDENT identifies each generic key
phrase of the schema with a particular variable of the problem, any schema
can be used only once in a problem. Because STUDENT handles schema in this
ad hoc fashion it cannot solve problems in which a relationship such as
"distance equals speed times time" is needed for two different values of
distance, speed, and time. With some effort, this weakness in the program
could be overcome.

(THE PROBLEM TO BE SOLVED IS)
(THE LENGTH OF A RECTANGLE IS 8 INCHES MORE THAN THE WIDTH
OF THE RECTANGLE . ONE HALF OF THE PERIMETER OF THE RECTANGLE
IS 18 INCHES . FIND THE LENGTH AND THE WIDTH OF THE RECTANGLE
.)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00001 (WIDTH OF RECTANGLE))
(EQUAL X00002 (LENGTH))
(EQUAL (TIMES .5000 (PERIMETER OF RECTANGLE)) (TIMES 18 (INCHES)))
(EQUAL (LENGTH OF RECTANGLE) (PLUS (TIMES 8 (INCHES)) (WIDTH
OF RECTANGLE)))

UNABLE TO SOLVE THIS SET OF EQUATIONS

TRYING POSSIBLE IDIOMS

(THE PROBLEM WITH AN IDIOMATIC SUBSTITUTION IS)
(THE LENGTH OF A RECTANGLE IS 8 INCHES MORE THAN THE WIDTH
OF THE RECTANGLE . ONE HALF OF TWICE THE SUM OF THE LENGTH
AND WIDTH OF THE RECTANGLE IS 18 INCHES . FIND THE LENGTH AND
THE WIDTH OF THE RECTANGLE .)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00003 (WIDTH OF RECTANGLE))
(EQUAL X00004 (LENGTH))
(EQUAL (TIMES (TIMES .5000 2) (PLUS (LENGTH) (WIDTH OF RECTANGLE)))
(TIMES 18 (INCHES)))
(EQUAL (LENGTH OF RECTANGLE) (PLUS (TIMES 8 (INCHES)) (WIDTH
OF RECTANGLE)))

UNABLE TO SOLVE THIS SET OF EQUATIONS

(USING THE FOLLOWING KNOWN RELATIONSHIPS)
((EQUAL (TIMES 1 (FEET)) (TIMES 12 (INCHES))))

(ASSUMING THAT)
((LENGTH) IS EQUAL TO (LENGTH OF RECTANGLE))

(THE LENGTH IS 13 INCHES)
(THE WIDTH OF THE RECTANGLE IS 5 INCHES)

Figure 15

Special Heuristics

The methods thus far discussed have been applicable to the entire range
of algebra problems. However, for special classes of problems, additional
heuristics may be used which are needed for members of the class, but not
applicable to other problems. An example is the class of age problems, as
typified by the problem in Figure 16.

Before the age problem heuristics are used, a problem must be
identified as belonging to that class. STUDENT identifies age problems by any
occurrence of one of the following phrases: "as old as", "years old" and
"age". This identification is made immediately after all words are looked
up in the dictionary and tagged by function. After the special heuristics
are used the modified problem is transformed to equations exactly as stated
previously.

(THE PROBLEM TO BE SOLVED IS)
(BILL S FATHER S UNCLE IS TWICE AS OLD AS BILL S FATHER . 2
YEARS FROM NOW BILL S FATHER WILL BE 3 TIMES AS OLD AS BILL
. THE SUM OF THEIR AGES IS 92 . FIND BILL S AGE .)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X0001 ((BILL / PERSON) S AGE))
(EQUAL (PLUS ((BILL / PERSON) S (FATHER / PERSON) S (UNCLE
/ PERSON) S AGE) (PLUS ((BILL / PERSON) S (FATHER / PERSON)
S AGE) ((BILL / PERSON) S AGE))) 92)
(EQUAL (PLUS ((BILL / PERSON) S (FATHER / PERSON) S AGE) 2)
(TIMES 3 (PLUS ((BILL / PERSON) S AGE) 2)))

(BILL S AGE IS 8)

Figure 16 

The need for special methods for age problems arises because of the
conventions used for denoting the variables, all of which are ages. The
word age is usually not used explicitly, but is implicit in such phrases
as "as old as". People's names are used where their ages are really the
implicit variables. In the example, for instance, the phrase "Bill's
father's uncle" is used to refer to "Bill's father's uncle's age".

STUDENT uses a special heuristic to make all these ages explicit. To
do this, it must know which words are "person words" and therefore may be
associated with an age. For this problem STUDENT has been told that Bill,
father, and uncle are person words. They can be seen tagged as such in the
equations. The "S" following a word is the STUDENT representation for
possessive, used instead of "apostrophe - s" for programming convenience.
STUDENT inserts a "S AGE" after every person word not followed by a "S"
(because this "S" indicates that the person word is being used in a
possessive sense, not as an independent age variable). Thus, as indicated,
the phrase "BILL S FATHER S UNCLE" becomes "BILL S FATHER S UNCLE S AGE".
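The insertion rule can be illustrated with a few lines of Python. The set of person words is the one given for this problem; the function name is hypothetical, and the dictionary tagging step is omitted.

```python
PERSON_WORDS = {"BILL", "FATHER", "UNCLE"}  # told to STUDENT for this problem


def make_ages_explicit(words):
    """Insert S AGE after every person word that is not followed by S."""
    out = []
    for i, word in enumerate(words):
        out.append(word)
        next_word = words[i + 1] if i + 1 < len(words) else None
        if word in PERSON_WORDS and next_word != "S":
            out.extend(["S", "AGE"])
    return out


sentence = "BILL S FATHER S UNCLE IS TWICE AS OLD AS BILL S FATHER".split()
print(" ".join(make_ages_explicit(sentence)))
# BILL S FATHER S UNCLE S AGE IS TWICE AS OLD AS BILL S FATHER S AGE
```

A phrase that already names an age, such as "BILL S AGE", is left unchanged because its only person word is followed by "S".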

In addition to changing phrases naming people to ones naming ages,
STUDENT makes certain special idiomatic substitutions. For the phrase
"their ages", STUDENT substitutes all the age variables encountered in
the problem. In the example, for "THEIR AGES" STUDENT substitutes "BILL S
FATHER S UNCLE S AGE AND BILL S FATHER S AGE AND BILL S AGE". The phrases
"as old as" and "years old" are then deleted as dummy phrases not having
any meaning, and "will be" and "was" are changed to "IS". There is no
need to preserve the tense of the copula, since the sense of the future
tense is preserved in such prefix phrases as "2 years from now".
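These idiomatic substitutions amount to a few phrase rewrites, sketched here in Python. The function name and the use of regular expressions are assumptions for illustration; STUDENT itself performs these steps with METEOR rules.

```python
import re


def apply_age_idioms(sentence, age_variables):
    """Expand THEIR AGES, delete the dummy phrases, and normalize the copula."""
    text = sentence.replace("THEIR AGES", " AND ".join(age_variables))
    for dummy in (r"\bAS OLD AS\b", r"\bYEARS OLD\b"):
        text = re.sub(dummy, "", text)
    for copula in (r"\bWILL BE\b", r"\bWAS\b"):
        text = re.sub(copula, "IS", text)
    return " ".join(text.split())  # collapse the spaces left by deletions


ages = ["BILL S FATHER S UNCLE S AGE", "BILL S FATHER S AGE", "BILL S AGE"]
print(apply_age_idioms("THE SUM OF THEIR AGES IS 92", ages))
print(apply_age_idioms(
    "IN 2 YEARS BILL S FATHER S AGE WILL BE 3 TIMES AS OLD AS BILL S AGE", ages))
# IN 2 YEARS BILL S FATHER S AGE IS 3 TIMES BILL S AGE
```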

The remaining special age problem heuristics are used to process the
phrases "in 2 years", "5 years ago" and "now". The phrase "2 years from
now" is transformed to "in 2 years" before processing. These three time
phrases may occur immediately after the word age (e.g., "Bill's age 3 years
ago") or at the beginning of the sentence. If a time phrase occurs at the
beginning of the sentence, it implicitly modifies all ages mentioned

sentence becomes the two sentences: "Mary is twice as old as Ann X00007
years ago. X00007 years ago Mary was as old as Ann is now." These two
occurrences of time phrases are handled as discussed previously. Similarly
the phrase "will be when" would be transformed to "in K years . In K years".
These decoupling heuristics are useful not only for the STUDENT program
but for people trying to solve age problems. The classic age problem in
Figure 17 took an MIT graduate student over 5 minutes to solve because he
did not know this heuristic. With the heuristic he was able to set up the
appropriate equations much more rapidly. As a crude measure of STUDENT's
relative skill, note that STUDENT took less than one minute to solve the
problem in Figure 17.

Global Information

This algebra problem-solving system contains two programs which
process English input. One is the program thus far discussed, STUDENT, which
accepts the statement of an algebra story problem and attempts to find the
solution to the particular problem. STUDENT does not store any information,
nor "remember" anything from problem to problem.

The other program is called REMEMBER and it processes and stores facts
not specific to any one problem. These facts make up STUDENT's store of
"global information". This information is accepted in a subset of English
which overlaps but is different from the subset of English accepted by STUDENT.
REMEMBER accepts statements in certain fixed formats. The following are
the formats currently understood, and the method of storage of the
information obtained in each format.

1) Example: Distance equals speed times time.
Format: P1 equals P2.
Processing: The sentence is transformed into an equation in the
same way it is done in STUDENT. This equation is stored on the property
lists of the atoms which are the first words in each variable. In the
example, the equation

"(EQUAL (DISTANCE) (TIMES (SPEED) (TIME)))"

is stored on the property list of "DISTANCE", "SPEED" and "TIME". This
equation will be retrieved if needed and one of these words appears in
the problem.

2) Example: Times is an operator of level 1.
Format: P1 is an operator of level K.
Processing: A dictionary entry for P1 is created with subscripts
of OP and K. For TIMES, the dictionary entry (TIMES/OP 1) is created.
These entries are those that are used to determine the tagging of words by
function.

3) Example: OF is an operator.
Format: P1 is an operator.
Processing: Creates a dictionary entry for P1 with the subscript
OP. The entry for OF is (OF/OP).

4) Example: Bill is a person.
Format: P1 is a P2.
Processing: Creates a dictionary entry for P1, with P2 as a
subscript. The entry for BILL is (BILL/PERSON).

5) Example: Feet is the plural of foot.
Format: P1 is the plural of P2.
Processing: On the property list of P1, after the flag SING, P2
is stored; on the property list of P2, after the flag PLURAL, the word P1
is stored. Thus FEET is stored after PLURAL on the property list of the
atom FOOT.

6) Example: One half always means 0.5.
Format: P1 always means P2.
Processing: The program STUDENT is actually modified so that if
P1 occurs, the mandatory substitution of P2 for P1 will be made. The last
sentence of this format processed by REMEMBER will be the first mandatory
substitution made. Thus "one always means 1" followed by "one half always
means 0.5" will cause the desired substitutions to be made; if these
sentences were reversed no occurrence of "one half" would ever be found since
it would have been changed to "1 half".

7) Example: Two numbers sometimes means one number and the other
number.
Format: P1 sometimes means P2.
Processing: The STUDENT program is modified so that the possible
idiomatic substitution of P2 for P1 will be made in a problem if it is
otherwise unsolvable. All such "possible idiomatic substitutions" are
tried when necessary, with the last one entered being the first one tried.

These last two formats actually insert new METEOR program statements
into STUDENT. If P1 and P2 are METEOR left and right halves respectively,
using special METEOR features, these formats can be used to extend the
subset of English understood by STUDENT.
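The ordering rule for "always means" substitutions in format 6 can be made concrete with a Python sketch. It models the rules as a list with the newest rule first; this is an assumed stand-in for the METEOR statements that REMEMBER actually inserts into STUDENT, and all function names are hypothetical.

```python
def remember_always_means(rules, p1, p2):
    """Record "P1 always means P2"; the latest rule is applied first."""
    rules.insert(0, (p1.split(), p2.split()))


def replace_phrase(words, old, new):
    """Replace every occurrence of the non-empty word sequence old by new."""
    out, i = [], 0
    while i < len(words):
        if words[i:i + len(old)] == old:
            out.extend(new)
            i += len(old)
        else:
            out.append(words[i])
            i += 1
    return out


def apply_mandatory(rules, words):
    """Apply each mandatory substitution in rule order."""
    for old, new in rules:
        words = replace_phrase(words, old, new)
    return words


good = []
remember_always_means(good, "ONE", "1")
remember_always_means(good, "ONE HALF", "0.5")   # entered last, applied first

bad = []
remember_always_means(bad, "ONE HALF", "0.5")
remember_always_means(bad, "ONE", "1")           # applied first, hides ONE HALF

phrase = "ONE HALF OF ONE NUMBER".split()
print(" ".join(apply_mandatory(good, phrase)))  # 0.5 OF 1 NUMBER
print(" ".join(apply_mandatory(bad, phrase)))   # 1 HALF OF 1 NUMBER
```

The second ordering shows the failure described in format 6: once "ONE" has been rewritten, the phrase "ONE HALF" can no longer be found.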

Conclusion 

The STUDENT program accepts as input an algebra story problem couched
in a restricted subset of English. It tries to solve this problem by
mapping the input into a relational model and manipulating structures in the
model. It uses general information also entered in English, and states in
English any assumptions and facts used to find the solution to the problem
given. If the solution is found, it is also communicated in English.

STUDENT could be extended in many ways within the framework developed.
For example, much more sophisticated sentence parsing routines could be
used. This would enable STUDENT to extract "simple sentences" from much
more complicated grammatical structures. In addition, it might help in
the identification of a phrase which is a paraphrase of another. This
identification is the most difficult task STUDENT attempts, and the methods
used are heuristic and ad hoc. Better methods making more use of the
meaning of the words in the phrases should be found. Because of the current
method of identifying variables, a stored schema or equations can be used
only once. This should be improved.

I think we are far from achieving a program which understands all of
English. However, within its limited area of competence, STUDENT has
demonstrated that it has a good "understanding" of English, and I think
that limited programs such as this may actually prove to be useful tools
in their own right.